Coronavirus Genomes and Unique Mutations in Structural and Non-Structural Proteins in Pakistani SARS-CoV-2 Delta Variants during the Fourth Wave of the Pandemic

Genomic epidemiology of SARS-CoV-2 is imperative to explore the transmission, evolution, and also pathogenicity of viruses. The emergence of SARS-CoV-2 variants of concern posed a severe threat to the global public health efforts. To assess the potential consequence of these emerging variants on public health, continuous molecular epidemiology is of vital importance. The current study has been designed to investigate the major SARS-CoV-2 variants and emerging mutations in virus structural and non-structural proteins (NSP) during the fourth wave in September 2021 from the Punjab province of Pakistan. Twenty SARS-CoV-2 positive samples have been collected from major cities were subjected to next-generation sequencing. Among the 20 whole genomes (GenBank Accession SRR16294858-SRR16294877), 2 samples failed to be completely sequenced. These genome sequences harbored 207 non-synonymous mutations, among which 19 were unique to GISAID. The genome sequences were detected: Delta 21I, 21J variants (B.1.617.2). Mutation’s spike_F157del, spike_P681R, spike_T478K, spike_T19R, spike_L452R, spike_D614G, spike_G142D, spike_E156G, and spike_R158del have been detected in all samples where K1086Q, E554K, and C1250W were unique in spike protein. These genomic sequences also harbored 129 non-synonymous mutations in NSP. The most common were NSP3_P1469S (N = 17), NSP3_A488S (N = 17), NSP3_P1228L (N = 17), NSP4_V167L (N = 17), NSP4_T492I (N = 17), NSP6_T77A (N = 17), NSP14_A394V (N = 17), NSP12_G671S (N = 18), and NSP13_P77L (N = 18). The mutation, F313Y in NSP12, detected in the current study, was found in a single isolate from Belgium. Numerous other unique mutations have been detected in the virus papain-like protease (NSP3), main protease (NSP5), and RNA-dependent RNA polymerase (NSP12). The most common non-synonymous mutations in the spike protein were subjected to stability analysis, exhibiting a stabilizing effect on structures. The presence of Delta variants may affect therapeutic efforts and vaccine efficacy. Continuous genomic epidemiology of SARS-CoV-2 in Pakistan may be useful for better management of SARS-CoV-2 infections.


Background
The deadly SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus-2) posed a major effect on public health, disrupting the global healthcare system. SARS-CoV-2 has the ability to exhibit a high mutation frequency due to the presence of single-stranded RNA, posing major issues to public health. Molecular epidemiology is required to study the virus evolutionary stages in specific geographic locations and also to establish how the virus may affect the vaccine efficacy and drug response.

Processing of Samples
The proper specimen collection procedure under laboratory testing for coronavirus disease  interim guidance by WHO was followed for SARS-CoV-2 samples collection and processing. The nasopharyngeal swabs were collected in 2 mL of vial transport medium (LinkGen, Taizhou, Jiangsu, China) and placed at 4 • C for further use. All these samples were confirmed with TaqPath™ COVID-19 CE-IVD RT-PCR Kit (Thermo Fisher, Waltham, MA, USA).

RNA Extraction, Quantification and cDNA SYNTHESIS
RNA extraction was performed from nasopharyngeal swabs using a standard volume of 200 ul by MagMAX™ Viral/Pathogen Nucleic Acid Isolation Kit (Thermo Fisher, Waltham, MA, USA) and (Applied Biosystem, Waltham, MA, USA). The extracted RNA was stored at −80 • C for downstream application. The real-time quantification and copy number determination of nucleic acid was performed using TaqPath™ 1-Step RT-qPCR Master Mix (Thermo Fisher, Waltham, MA, USA) and the relevant copy number was determined according to the CT and control reactions in quantification. Only samples with relevant CT of 18-28 values were chosen. Using a minimum input amount of 10 ng/uL of the RNA sample, RNA to cDNA reversed transcription was performed with the SuperScript™ VILO™ cDNA Synthesis Kit (Thermo Fisher, Waltham, MA, USA).

Library Preparation
The library was prepared manually using Ion AmpliSeq™ Library Kit (Thermo Fisher, Waltham, MA, USA). Amplification was performed with a 2-pool Ion AmpliSeq™ SARS-CoV-2 Research Panel (Thermo Fisher, Waltham, MA, USA). The library was partially digested and ligated with Ion Xpress™ Barcode Adapters 1-96 Kit (Thermo Fisher, Waltham, MA, USA). Library purification and quantification steps were carried out with GenDx-AMPure®XP beads (GENDX, Utrecht, The Netherlands) and TaqMan™ Quantitation Kit (Thermo Fisher, Waltham, MA, USA). The libraries were diluted to a final concentration of 40 pM, loaded on Thermo Fisher Scientific Ion Chef™ Instrument for PCR, and loaded on Ion S5 530 and 510 (Thermo Fisher, USA).

Whole Genome Sequencing and Data Analysis
Sequencing was accomplished on Ion GeneStudio™ S5 System at LabGenetix, Lahore, Pakistan. The fastQ base sequence file quality was assessed using the FastQC (v0.11.8). Trimmomatic tool (v0.39) was used to remove the low-quality reads (Q < 30) and index adapter sequences were utilized to improve sample multiplexing. Sequenced reads were aligned with the reference (NC 045512, using Burrows Wheeler Aligner (BWA, v0.6). The duplicated PCR reads were removed with Picard Tools (v2.21.6). Mapping problems due to the presence of small Indels were removed with a Genome Analysis Toolkit, "Realign-erTargetCreator" and "InDelRealigner" (GATK v. 3.3.0) to analyze the mapped read. To improve the accuracy of variant calling, GATK tool "HaplotypeCaller" was applied for realignment of sequenced reads via local de-novo assembly of haplotypes in the regions showing variation.
All the 20 whole genome sequences in fasta format were aligned with reference (NC_045512) using CoVsurver application (https://www.gisaid.org/epiflu-applications/ covsurver-mutations-app/ (accessed on 28 September 2021)) on 15-20 September 2021, collected during the fourth wave of infection. The CoVsurver research tool has been developed with GISAID (Global initiative on sharing all influenza data), aiding researchers to identify and interpret the amino acids (aa) changes in coronavirus genomes. The server rapidly aligns the query genome sequences in fasta format with reference SARS-CoV-2 and screen coronavirus genomes for aa changes to identify any special epidemiological relevance. All mutations in structural proteins of SARS-CoV-2 were separated and arranged in the form of excel sheets. The statistical analysis was performed using EpiData Analysis [24] to analyze various non-synonymous mutations in virus structural and NSP.

Mutations Effect on Virus Structural Proteins
Some of the most common mutations were analyzed for their thermodynamic effect on S protein through DynaMut server [25]. The server implements mutation effect, which can be used to analyze the protein stability and structural flexibility upon point mutation. The server also measures the vibrational entropy changes and the impact of a mutation with graph-based signatures (p-value < 0.001) along with a good resolution of results. To compute the SARS-CoV-2 wild type and mutant protein stability and flexibility, the PDB file of proteins were retrieved from the Protein Data Bank [26] and uploaded to the DynaMut server and a point mutation was inserted at specific site. The impact in the form of total energies and vibrational entropy energies between wild type and mutants was recorded. The high-resolution structures of wilt type and mutant S proteins were retrieved for further processing.
In DynaMut, changes upon point mutation on free energy of protein folding, combining the impacts of mutation stability of protein and also the dynamic properties were computed by ENCoM, Bio3D, and DUET computational servers, generating a more robust predictor of energies.

Phylogenetic Analysis
Genomic sequences in the current study were subjected to MAFT server [27] and Nextstrain [28] for phylogenetic analysis. These are public servers, containing a pipeline for analysis, and visualization, presenting a real-time view into the evolution of viral pathogens.

SARS-CoV-2 Patient Information
Twenty nasopharyngeal swab samples were collected during the fourth wave of the pandemic in September 2021 from SARS-CoV-2 patients (Table 1). Among these samples, 13 were collected from male suspect and seven from female. All the SARS-CoV-2 patients developed clear signs and symptoms, including fever, cough, headache, fatigue, and also significant loss of smell and taste. Eleven patients were found in age category 1 , seven were in category 2 (41-60) and two were 65 years old.

Unique Mutations in Structural Proteins
All the genomic sequences harbored 207 different non-synonymous mutations, including indel, among which 19 were unique to GISAID (S1). All the genomic samples were of GK clade. Mutations E554K (S1 domain of S), K1086Q, and C1250W (S2 domain of S) ( Figure 1) were unique to S protein, and in GISAID, present in samples 1, 11, and 19. The remaining unique mutations in all samples are provided in the Supplementary File S1.

SARS-CoV-2 Variants
All the samples were detected as Delta variants. Mutation details in S and other structures are provided in Supplementary Files S1-S3. These sequences harbored diverse kinds of mutations in the N-terminal and RBD of S protein that may upsurge the immune eva-

SARS-CoV-2 Variants
All the samples were detected as Delta variants. Mutation details in S and other structures are provided in Supplementary Files S1-S3. These sequences harbored diverse kinds of mutations in the N-terminal and RBD of S protein that may upsurge the immune evasion possibility of Delta variant.

Unique Mutations in Structural Proteins
The sequences harbored 129 mutations in NSP in which 13 were unique to GISAID (Table 4)

Unique Mutations in Structural Proteins
The sequences harbored 129 mutations in NSP in which 13 were unique to GISAID (Table 4) (Supplementary File S3). The highest number of unique mutations were detected in NSP3 (N = 10), followed by NSP5 (V86L), NSP12 (F313Y), and NS3 (I118L). The remaining unique mutations in the samples are provided in Supplementary File S3.

Mutations in NSP
The mutation details in NSP are shown in Table 5. In SARS-CoV-2, NSP3 is a large multidomain protein of 1945 amino acids containing a papain-like protease (PLpro) domain (aa746-1060). PLpro, an essential component of the virus replication transcription complex, is highly conserved, present between unique and a nucleic acid-binding domains. Among the unique mutation, NS3_I118L was detected in 14 isolates (Table 4).  The highest frequency of mutations was detected in virus NSP3 (Table 2) in which P1469S (N = 17), A488S (N = 17), and P1228L (N = 17) were the most common. The majority of these mutations were detected in the C-terminal domain (CTD) of NSP3. Deletions with a single frequency in amino acid regions 1500 to 1533 were also observed at CTD. However, the PLpro domain (746-1060) seems highly conserved ( Figure 6). Mutations with a single frequency at position L862F, P822L, and H920Y were detected in the PLpro domain of NSP3 ( Figure 6D). NSP2, which modulates the host cell survival signaling pathway, also harbored six non-synonymous mutations with a very low frequency. NSP4 harbors four mutations in which V167L in NTD and T492I in CTD were present in 17 isolates.
The highest frequency of mutations was detected in virus NSP3 (Table 2) in which P1469S (N = 17), A488S (N = 17), and P1228L (N = 17) were the most common. The majority of these mutations were detected in the C-terminal domain (CTD) of NSP3. Deletions with a single frequency in amino acid regions 1500 to 1533 were also observed at CTD. However, the PLpro domain (746-1060) seems highly conserved ( Figure 6). Mutations with a single frequency at position L862F, P822L, and H920Y were detected in the PLpro domain of NSP3 ( Figure 6D). NSP2, which modulates the host cell survival signaling pathway, also harbored six non-synonymous mutations with a very low frequency. NSP4 harbors four mutations in which V167L in NTD and T492I in CTD were present in 17 isolates.
NSP5 (main protease), harbored three non-synonymous mutations in a very low frequency. Four mutations were detected in NSP6 in which T77A was detected with the highest frequency (N = 17). Virus NSP12, which is also called RdRp (RNA-dependent RNA polymerase), plays an essential role in replication. Mutations NSP12_G671S (N = 18) and NSP12_P323L (N = 16) were the most common, present in the interface and RdRp domain ( Figure 6).  NSP5 (main protease), harbored three non-synonymous mutations in a very low frequency. Four mutations were detected in NSP6 in which T77A was detected with the highest frequency (N = 17). Virus NSP12, which is also called RdRp (RNA-dependent RNA polymerase), plays an essential role in replication. Mutations NSP12_G671S (N = 18) and NSP12_P323L (N = 16) were the most common, present in the interface and RdRp domain ( Figure 6).
A single mutation P77L in NSP13 was detected in all genomic isolates (N = 18). Fortyone mutations were detected in NSP14, including five non-synonymous mutations in which NSP14_A394V (N = 17) NSP14_M72I (N = 6) were the most common.

Discussion
The recent emergence of different SARS-CoV-2 variants posed severe health issues. The α is one of the identified variants that first emerged in the UK and became one of the predominant lineages worldwide. Pakistan reported its first case with α variant in December 2020 [29], triggered S protein, and rapidly spread, leading to the third wave in Pakistan [30]. Molecular epidemiology is essential for better management of viral diagnosis and treatment. However, limited genomic data are available on other genetic lineages from all provinces of Pakistan. Further, frequency of mutations in all target's proteins, including NSP, has not been investigated properly. Surveillance of the population mainly depends on characterizing the differences between reported, emerging, and non-VOC lineages in order to design efficient diagnostic methods, proper therapeutics, and the designing of effective vaccines.
Among the different variants, the Delta variant of SARS-CoV-2 is more deadly and highly transmissible with severe disease signs and symptoms. This variant has been detected in 77 countries as of 24 June 2021 [31,32]. The UK has faced drastic effects from public health measures due to the Delta variant. To prevent the rapid transmission of the virus in the population, there is an urgent need to design more integrated molecular diagnostic systems with the implication for strict rules of quarantines for international travelers.
In the current study, all the genomic samples were detected as Delta variants when analyzed on GISAID CoV-Surver on 15 to 20 September 2021, collected during the fourth wave of infection. Mutations T478K, T19R, L452R, F157del, E156G, P681R, D614G, R158del, and G142D were the most common, present in all genomics isolates. The T478K mutation present in RBD of S protein is involved in interaction with human ACE2 [33,34]. The T478K is unique to the SARS-CoV2 Delta variant, present in the epitope region of potent neutralizing monoclonal antibodies [35]. This point mutation exhibited a structural stabilizing impact on S protein ( Figure 3).
A previous study during the third wave from Pakistan reported that the Delta variant is prevalent in 45% of cases followed by β (46%) [36]. Very little information was provided on mutations in structural proteins while NSP mutations were not provided in detail. In the current study, comprehensive details of mutations in the virus, all targeting proteins, including accessory proteins, have been provided for better understanding of the variations in circulating isolates.
Among the structural proteins, the S protein harbored two mutations in the RBD region (L452R and T478K), four in the NTD (T19R, G142D, ∆156-157 and R158del), and one at the furin-cleavage site (P681R) and S2 region (D950N), detected in 12 virus genome sequences (Table 3). This cluster was observed in Indian isolates in October 2020 [37], exhibiting higher pathogenicity than other variants [38]. The T478K mutation at the RBD region of S protein, fell near the E484K that facilitates antibody escape [39]. The L452R is an antibodyescaping mutation and the virus with this mutant is resistant to convalescent plasma and monoclonal antibody therapy [40]. The furin-cleavage site plays an important role in viral pathogenesis [41] and mutation analysis revealed that L452R and P681R increase the ACE2 binding and transmissibility. Consistent with previous study [42] in which variants harbor different mutations, it exhibited increased binding between S protein RBD region and ACE2 receptor. Similar to this previous study, mutants T19R, E156G, L452R, T478K, and P681R in S protein (Figures 3 and 4) exhibited a stabilizing effect on protein structure, facilitating the binding affinity for more stable interactions with human ACE2. This stability effect may increase the virus transmission in populations.
Mutations Arg158 and Phe-157/del were also detected in all genomic isolates (Table 3), present in NTD of S protein. In a more recent study, E156G, Arg158, and Phe-157/del were found, causing rigidity and reduced flexibility, thus providing fitness advantage and immune escape [43].
The Delta S protein P681R mutation has an important role in the variant replacement of the α-to-Delta. The Delta S P681R mutation present at the furin-cleavage site exhibited stabilizing effects (Figure 4), separating the S1 and S2 regions of S protein. Mechanistically, the P681R mutation of Delta improved the full-length cleavage of S1 and S2, facilitated the virus cell surface entry and increased infection 16. These mutations must be regularly monitored in ongoing pandemics for surveillance and virus severity [44].
Considering the circulation of the Delta virus with specific mutations and a high transmissible rate, the Pakistani national health authority needs to take timely measures against it. Further, molecular epidemiological studies with insight into genomic analysis of the virus are needed to screen the most prevalent variants and specific mutations that may affect the diagnosis and vaccine potency, to adopt potential measures in the future.
Emerging mutations in NSP may affect the virus transmission and pathogenicity. Recently, a synonymous mutation (F106F) in NSP3 along with other signature mutations exhibited a virus fitness effect [45]. In CoV-2 NSP3, the PLpro domain is a large protein and an essential member of replication transcription complex [46,47]. PLpro domain has a catalytic domain (Figure 1) for cleavage activity. Mutations in this domain may affect a catalytic process of PLpro. In the current study, three mutations were detected in this domain with many unique to GISAID. The effect of these mutations may be investigated in future studies for better designing of inhibitors against PLpro.
The NSP6_L37F mutation, which was very common in the earlier infections, were associated with asymptomatic cases. NSP6 reduces autophagic ability, important for viral infections and promotes cell death [48]. Mutation NSP6_T77A in the current study ( Table 2) seems emerging in the current wave. However, its effect on virus severity is needed to be explored for better management of COVID-19.
Previously, mutation P323L in RdRp was the most common in Pakistani isolates [49]. This mutation is still present in the Delta variants (Table 2), which has a stabilizing effect, while interacting with viral RNA, which lies in the interface domain of RdRp. This domain is the antiviral (filibuvir, Simeprevir) binding site [43], which may show weak interaction if mutations emerge. Mutation G671S in NSP12 detected in the current study has been observed as emerging, present in all 18 isolates, increasing the stability of the protein [50]. However, further investigations are needed to see its effect on virus pathogenicity. A single mutation, NSP13_P77L, which is characteristic only in Delta variants, was detected in all genomic sequences in the current study, and may have a destabilizing effect [51]. NSP14_A394V is the most common mutation in NSP14 (Table 2), and has a role in one of the positive section sites of virus [52].
Vaccine efficacy may be reduced in previous VOCs; however, BNT162b2 retained its potency against β VOC [53]. Omicron harbored a number of known and unique mutation patterns (Supplementary Files) as compared to other VoCs; therefore, the vaccines which are effective against Omicron infections are not clear. Vaccines have been found to be effective against other VOCs. Observational investigation in Qatar and Kaiser Permanente [54,55] found more than 90% vaccine efficacy against the Delta-variant. Data from New York indicates a good efficacy for individuals 65 years and older, exhibiting different levels of efficacy for different vaccines [56].
In conclusion, Delta variants with some unique mutations, are circulating during the fourth wave of the pandemic, in major cities in the Punjab province of Pakistan. The structural proteins and NSP harbored numerous mutations, present in different functional domains. Mutations spike_K1086Q, spike_E554K, and spike_C1250W were unique to GI-SAID. The most common mutations in the S protein exhibited a stabilizing effect, facilitating the binding affinity for S protein RBD with human ACE2. Geographic specific vaccines and drugs may be designed for better management of COVID-19 in the future. The geo-climate distribution of the mutations may decipher higher uniqueness and disease severity in underdeveloped countries, including Pakistan. The effect of these mutations on virus pathogenicity may be experimentally verified to access the effect on virus severity and vaccine efficacy. Continuous molecular epidemiology is required to screen geographic-specific mutations for better understanding of the virus pathogenicity, diagnosis, and treatment of COVID-19. Informed Consent Statement: Written informed consent has been obtained from the patient(s) to publish this paper.

Conflicts of Interest:
The authors declare no conflict of interest.